Learn Apache mod_rewrite: 13 Real-world Examples
This article was written in 2007 and remains one of our most popular posts.
Apache’s low cost and powerful feature set make it the server of choice for organizations around the world. One of its most valuable treasures is the mod_rewrite module, which rewrites a visitor’s request URI according to a set of rules.
This article will lead you through rewrite rules, regular expressions, and rewrite conditions, and provide a great list of examples.
First off, I’m going to assume that you understand the common reasons for wanting a URI rewriting feature for your web site. If you’d like information about this field, there’s a good primer in the SitePoint article, mod_rewrite: A Beginner’s Guide to URL Rewriting. There, you’ll also find instructions on how to enable it on your own server.
Testing Your Server Setup
Some hosts do not have mod_rewrite enabled (by default it is not enabled). You can find out if your server has mod_rewrite enabled by creating a PHP script with one simple line of PHP code:
<?php phpinfo(); ?>
If you load the script with a browser, look for the list of loaded Apache modules (it appears when PHP runs as an Apache module). If mod_rewrite isn’t listed there, you’ll have to ask your host to enable it, or find a "good host". Most hosts will have it enabled, so you’ll be good to go.
The Magic of mod_rewrite
Here’s a simple example for you: create three text files named test.html, test.php, and .htaccess.
In the test.html file, enter the following:
<h1>This is the HTML file.</h1>
In the test.php file, add this:
<h1>This is the PHP file.</h1>
Create the third file, .htaccess, with the following:
RewriteEngine on
RewriteRule ^/?test\.html$ test.php [L]
Upload all three files (in ASCII mode) to a directory on your server, and type:
http://www.example.com/path/to/test.html
into the location box — using your own domain and directory path of course! If the page shows "This is the PHP file", it’s working properly! If it shows "This is the HTML file," something’s gone wrong.
If your test worked, you’ll notice that the test.html URI has remained in the browser’s location box, yet we’ve seen the contents of the test.php file. You’ve just witnessed the magic of mod_rewrite!
mod_rewrite Regular Expressions
Now we can begin rewriting your URIs! Let’s imagine we have a web site that displays city information. The city is selected via the URI like this:
http://www.example.com/display.php?country=USA&state=California&city=San_Diego
Our problem is that this is way too long and unfriendly to users. We’d much prefer it if visitors could use:
http://www.example.com/USA/California/San_Diego
We need to be able to tell Apache to rewrite the latter URI into the former. In order for the display.php script to read and parse the query string, we’ll need to use regular expressions to tell mod_rewrite how to match the two URIs. If you’re not familiar with regular expressions (regex), many sites provide excellent tutorials. At the end of this article, I’ve listed the best pages I’ve found on the topic. If you’re not able to follow my explanations, I recommend reviewing the first two of those links.
A very common approach is to use the expression (.*). This expression combines two metacharacters: the dot character, which means ANY character, and the asterisk character, which specifies zero or more of the preceding character. Thus, (.*) matches everything in the {REQUEST_URI} string. {REQUEST_URI} is that part of the URI which follows the domain up to but not including the ? character of a query string, and is the only Apache variable that a rewrite rule attempts to match.
Wrapping the expression in parentheses stores it in an "atom," which is a variable that allows the matched characters to be reused within the rule. Thus, the expression above would store USA/California/San_Diego in the atom. To solve our problem, we’d need three of these atoms, separated by the subdirectory slashes (/), so the regex would become:
(.*)/(.*)/(.*)
Given the above expression, the regex engine will match (and save) three values separated by two slashes anywhere in the {REQUEST_URI} string. To solve our specific problem, though, we’ll need to restrict this somewhat; after all, the first and last atoms above could match anything!
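To see how loosely this pattern behaves, here is a minimal sketch that uses Python’s re module as a stand-in for mod_rewrite’s regex engine. That substitution is an assumption made for illustration, and the request paths are made-up test inputs.

```python
import re

# Python's re module stands in for mod_rewrite's regex engine here;
# a pattern this simple behaves the same way in both.
pattern = re.compile(r"(.*)/(.*)/(.*)")

# The intended case: exactly three path segments.
expected = pattern.search("USA/California/San_Diego").groups()

# An unintended case: the greedy first group swallows the extra slashes.
surprise = pattern.search("USA/California/San_Diego/extra/stuff").groups()

# Even empty segments are accepted, because * allows zero characters.
empty = pattern.search("//").groups()

print(expected)
print(surprise)
print(empty)
```

The second and third results show why this pattern cannot be trusted to hand clean values to a script.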
To begin with, we can add the start and end anchor characters. The ^ character anchors the match to the start of the string, and the $ character anchors it to the end of the string.
^(.*)/(.*)/(.*)$
This expression specifies that the whole string must be matched by our regex; there cannot be anything else before or after it.
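The effect of the anchors is easier to see on a literal pattern, so the sketch below uses the test.html file name from the earlier test. Python’s re module again stands in for mod_rewrite’s engine, and the three sample paths are hypothetical.

```python
import re

# Python's re module stands in for mod_rewrite's regex engine here.
unanchored = re.compile(r"test\.html")
anchored = re.compile(r"^test\.html$")

# Hypothetical request paths used only to exercise the patterns.
paths = ["test.html", "old/test.html", "test.html.bak"]

# Without anchors the pattern may match anywhere inside the string.
unanchored_hits = [p for p in paths if unanchored.search(p)]

# With ^ and $ the whole string has to be exactly test.html.
anchored_hits = [p for p in paths if anchored.search(p)]

print(unanchored_hits)
print(anchored_hits)
```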
However, this approach still allows too many matches. We’re storing our matches as atoms, and will be passing them to a query string, so we have to be able to trust what we match. Matching anything with (.*) is too much of a potential security hazard, and, when used inappropriately, could even cause mod_rewrite to get stuck in a loop!
To avoid unnecessary problems, let’s change the atoms to specify precisely the characters that we will allow. Because the atoms represent location names, we should limit the matched characters to upper and lowercase letters from A to Z, and because we use it to represent spaces in the name, the underscore character (_) should also be allowed. We specify a set using square brackets, and a range using the - character. So the set of allowed characters is written as [a-zA-Z_]. And because we want to avoid matching blank names, we add the + metacharacter, which specifies a match only on one or more of the preceding character. Thus, our regex is now:
^([a-zA-Z_]+)/([a-zA-Z_]+)/([a-zA-Z_]+)$
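As a quick check of what the tightened pattern accepts, the following sketch runs it against a handful of hypothetical request paths, with Python’s re module standing in for mod_rewrite’s regex engine.

```python
import re

# Python's re module stands in for mod_rewrite's regex engine here.
pattern = re.compile(r"^([a-zA-Z_]+)/([a-zA-Z_]+)/([a-zA-Z_]+)$")

# Hypothetical request paths used only to exercise the pattern.
candidates = [
    "USA/California/San_Diego",        # accepted
    "USA/California/San_Diego/extra",  # rejected: a fourth segment
    "USA//San_Diego",                  # rejected: empty state name
    "USA/California/San Diego",        # rejected: space is not in the set
    "USA/California/90210",            # rejected: digits are not in the set
]

accepted = [path for path in candidates if pattern.search(path)]
print(accepted)
```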
The {REQUEST_URI} string starts with a / character. Whether a rewrite rule sees that slash depends on where the rule lives: in the main server or virtual host configuration the leading slash is present, while in a per-directory context such as a .htaccess file Apache strips the directory prefix, slash included, before matching. We can satisfy both cases by making the leading slash optional with the expression ^/? (? is the metacharacter for zero or one of the preceding character). So now we have:
^/?([a-zA-Z_]+)/([a-zA-Z_]+)/([a-zA-Z_]+)$
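A short sketch of what the optional slash buys us, again using Python’s re module as an assumed stand-in for mod_rewrite’s engine: the same pattern accepts the path with or without one leading slash, but not with two.

```python
import re

pattern = re.compile(r"^/?([a-zA-Z_]+)/([a-zA-Z_]+)/([a-zA-Z_]+)$")

# /? allows zero or one leading slash, so both forms match.
with_slash = pattern.search("/USA/California/San_Diego")
without_slash = pattern.search("USA/California/San_Diego")

# Two leading slashes are one too many for /? to absorb.
double_slash = pattern.search("//USA/California/San_Diego")

print(with_slash.groups() == without_slash.groups())
print(double_slash)
```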
With regex in hand, we can now map the atoms to the query string:
display.php?country=$1&state=$2&city=$3
$1 is the first (country) atom, $2 is the second (state) atom and $3 is the third (city) atom. Note that only nine atoms, $1 ... $9, can be created, numbered in the order in which their opening parentheses appear in the regular expression.
We’re almost there! Create a new .htaccess file with the text:
RewriteEngine on
RewriteRule ^/?([a-zA-Z_]+)/([a-zA-Z_]+)/([a-zA-Z_]+)$ display.php?country=$1&state=$2&city=$3 [L]
Save this to the directory in which display.php resides. The rewrite rule must go on one line with one space between the RewriteRule statement, the regex, and the redirection (and before any optional flags). We’ve used the [L], or ‘last’ flag, which is the terminating flag (more on flags later).
Our rewrite rule is now complete! The atom values are being extracted from the request string and added to the query string of our rewritten URI. The display.php script will likely extract these values from the query string and use them in a database query or something similar.
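The whole mapping can be simulated outside Apache. The sketch below is only an approximation of what mod_rewrite does with this one rule: the rewrite function is a hypothetical helper, Python’s re module stands in for the regex engine, and Python’s \1 notation takes the place of $1.

```python
import re
from urllib.parse import parse_qs

RULE_PATTERN = re.compile(r"^/?([a-zA-Z_]+)/([a-zA-Z_]+)/([a-zA-Z_]+)$")
# Python writes backreferences as \1, \2, \3 where mod_rewrite uses $1, $2, $3.
SUBSTITUTION = r"display.php?country=\1&state=\2&city=\3"


def rewrite(path):
    """Return the rewritten URI, or the path unchanged if the rule does not match."""
    match = RULE_PATTERN.search(path)
    return match.expand(SUBSTITUTION) if match else path


rewritten = rewrite("USA/California/San_Diego")
print(rewritten)

# What a script such as display.php would find in its query string.
query = parse_qs(rewritten.split("?", 1)[1])
print(query)
```

A path that does not fit the pattern, such as images/logo.png, comes back unchanged, which mirrors how Apache simply moves on when a rule does not match.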
If, however, you have only a short list of allowable countries, it might be best to avoid potential database problems by specifying the acceptable values within the regex. Here’s an example:
^/?(USA|Canada|Mexico)/([a-zA-Z_]+)/([a-zA-Z_]+)$
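Here is a small sketch of the whitelist in action, with Python’s re module as an assumed stand-in for mod_rewrite’s engine and two made-up request paths.

```python
import re

pattern = re.compile(r"^/?(USA|Canada|Mexico)/([a-zA-Z_]+)/([a-zA-Z_]+)$")

# A country from the list is accepted and captured as the first atom.
allowed = pattern.search("Canada/Ontario/Toronto")

# Any other country fails the match, so the rule would not fire at all.
blocked = pattern.search("France/Normandy/Rouen")

print(allowed.group(1))
print(blocked)
```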
If you’re concerned about capitalization because the values in your database are strictly lowercase, you can make the regex engine ignore the case by adding the No Case flag, [NC], after the rewritten URI. Just don’t forget to convert the values to lowercase in your script after you obtain the $_GET array.
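The sketch below illustrates why the lowercase conversion is still needed. It assumes Python’s re.IGNORECASE as a rough equivalent of [NC]: the flag widens what matches, but the captured text keeps whatever capitalization the visitor typed.

```python
import re

# Hypothetical request path with inconsistent capitalization.
path = "usa/California/san_diego"
pattern_text = r"^/?(USA|Canada|Mexico)/([a-zA-Z_]+)/([a-zA-Z_]+)$"

# Without the flag, "usa" does not match the literal "USA".
case_sensitive = re.search(pattern_text, path)

# With the flag the match succeeds, but the captures are left as typed.
no_case = re.search(pattern_text, path, re.IGNORECASE)
print(no_case.groups())

# The normalization step the script still has to perform itself.
country, state, city = (value.lower() for value in no_case.groups())
print(country, state, city)
```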
If you want to use numbers (0, 1, … 9) for, say, Congressional Districts, then you’ll need to change an atom’s specification from ([a-zA-Z_]+) to ([0-9]) to match a single digit, ([0-9]{1,2}) to match one or two digits (0 through 99), or ([0-9]+) for one or more digits, which is useful for database IDs.
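The three digit atoms can be compared side by side. In this sketch each atom is anchored so it has to match the whole sample string; Python’s re module stands in for mod_rewrite’s engine and the sample values are arbitrary.

```python
import re

single = re.compile(r"^([0-9])$")            # exactly one digit
one_or_two = re.compile(r"^([0-9]{1,2})$")   # one or two digits
one_or_more = re.compile(r"^([0-9]+)$")      # any number of digits, at least one

samples = ["7", "42", "2007", ""]

# For each sample: (matches single, matches one_or_two, matches one_or_more)
results = {
    text: (
        bool(single.search(text)),
        bool(one_or_two.search(text)),
        bool(one_or_more.search(text)),
    )
    for text in samples
}

for text, outcome in results.items():
    print(repr(text), outcome)
```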
The RewriteCond Statement
Now that you’ve learned how to use mod_rewrite’s basic RewriteRule statement with the {REQUEST_URI} string, it’s time to see how we can use conditionals to access other variables with the RewriteCond statement. The RewriteCond statement is used to specify the conditions under which a RewriteRule statement should be applied.
RewriteCond is similar in format to RewriteRule in that you have the command name, RewriteCond, a variable to be matched, the regex, and flags. The logical OR flag, [OR], is a useful flag to remember because consecutive RewriteCond statements are combined with a logical AND by default, and together they govern only the RewriteRule that immediately follows them.
You can test many server variables with a RewriteCond statement. You can find a list in the SitePoint article I mentioned previously, but this is the best list of server variables I’ve found.
As an example, let’s assume that we want to force the www in your domain name. To do this, you’ll need to test the Apache {HTTP_HOST} variable to see if the www. is already there and, if it’s not, redirect to the desired host name:
RewriteCond %{HTTP_HOST} !^www\.example\.com$ [NC]
RewriteRule .? http://www.example.com%{REQUEST_URI} [R=301,L]
Here, to denote that {HTTP_HOST} is an Apache variable, we must prepend a % character to it. The regex begins with the ! character, which will cause the condition to be true if it doesn’t match the pattern. We also have to escape the dot character so that it matches a literal dot and not any character, as is the case with the dot metacharacter. We’ve also added the No Case flag to make this operation case-insensitive.
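To illustrate why the escaping matters, this sketch compares an unescaped and an escaped host pattern. Python’s re module is an assumed stand-in for mod_rewrite’s engine, re.IGNORECASE approximates [NC], and the host names are invented test values.

```python
import re

# A made-up host name that is clearly not www.example.com.
host_typo = "wwwXexample.com"

# Unescaped dots match any character, so the wrong host slips through.
loose = re.search(r"^www.example.com$", host_typo)

# Escaped dots match only a literal dot, so the wrong host is rejected.
strict = re.search(r"^www\.example\.com$", host_typo)

# The case-insensitive flag accepts any capitalization of the right host.
mixed_case = re.search(r"^www\.example\.com$", "WWW.Example.COM", re.IGNORECASE)

print(loose is not None, strict is not None, mixed_case is not None)
```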
The RewriteRule will match zero or one of any character, and will redirect to http://www.example.com plus the original {REQUEST_URI} value. The R=301, or redirect, flag will cause Apache to issue an HTTP 301 response, which indicates that this is a permanent redirection; the Last flag tells mod_rewrite to stop here and not apply any further rules.
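Putting the condition and the rule together, the decision Apache makes can be approximated in a few lines. This is a sketch under stated assumptions rather than a model of mod_rewrite itself: respond is a hypothetical helper, and Python’s re module stands in for the regex engine.

```python
import re

CANONICAL_HOST = re.compile(r"^www\.example\.com$", re.IGNORECASE)
ANY_PATH = re.compile(r".?")  # zero or one character, so it matches every path


def respond(host, request_uri):
    """Return (status, location) for a redirect, or None when no redirect is needed."""
    # The ! in the condition means: proceed only if the host does NOT match.
    if CANONICAL_HOST.search(host) is None and ANY_PATH.search(request_uri):
        return 301, "http://www.example.com" + request_uri
    return None


print(respond("example.com", "/about.html"))
print(respond("www.example.com", "/about.html"))
```

Because the redirected request arrives with the canonical host, the condition fails on the second pass and the redirect does not repeat.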
RewriteCond statements can also create atoms, but these are denoted with %1 ... %9 in the same way that RewriteRule atoms are denoted with $1 ... $9. You’ll see these atom variables in operation in the examples later on.
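To make the two kinds of atoms concrete, here is a sketch of how a substitution string containing both %1 and $1 would be filled in. The condition and rule in the comments are hypothetical (they strip a leading www. rather than add one), expand is an illustrative helper and not part of any Apache API, and Python’s re module stands in for the regex engine.

```python
import re


def expand(template, rule_match, cond_match):
    """Replace $N with RewriteRule groups and %N with RewriteCond groups."""
    def lookup(ref):
        source = rule_match if ref.group(1) == "$" else cond_match
        return source.group(int(ref.group(2)))
    return re.sub(r"([$%])([0-9])", lookup, template)


# Hypothetical directives being imitated:
#   RewriteCond %{HTTP_HOST} ^www\.(.+)$ [NC]
#   RewriteRule ^/?(.*)$ http://%1/$1 [R=301,L]
cond_match = re.search(r"^www\.(.+)$", "www.example.com", re.IGNORECASE)
rule_match = re.search(r"^/?(.*)$", "docs/index.html")

target = expand("http://%1/$1", rule_match, cond_match)
print(target)
```

Here %1 carries the host captured by the condition and $1 carries the path captured by the rule, so the two sets of atoms never collide.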